Efficient Computation of Partial-Support for Mining Interesting Itemsets

Authors

  • Ardian Kristanto Poernomo
  • Vivekanand Gopalkrishnan

Abstract

Mining interesting itemsets is a popular topic in the data mining community. The objective is to mine all itemsets that are interesting with respect to a given interestingness measure. While considerable effort has been spent on justifying the various interestingness measures, the algorithms that mine them are not well studied, except in the case of support, which has resulted in the well-known frequent itemset mining (FIM) problem. In this paper, we show that a certain class of interesting itemsets can be represented by functions of their partial supports. This class includes some definitions of fault-tolerant itemsets, the estimated support of itemsets in noisy data, and the bond of itemsets. As the name implies, the partial support of an itemset is the number of transactions containing some part of the given itemset. This paper addresses the problem of efficiently calculating partial supports, which leads to efficient algorithms for mining interesting itemsets in that class. We show that there exists a recurrence relation between partial supports. Hence, we can calculate the partial supports of an itemset by simply extending any FIM algorithm (or even an existing implementation). This allows us to benefit from innovations and optimizations in FIM algorithms. Theoretical analysis shows that our approaches retain the running-time complexity of the base FIM algorithms at only a small cost in space. Extensive experiments on several real-world datasets also demonstrate that algorithms based on our approach are significantly faster than previously proposed techniques for the corresponding definitions.
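The abstract does not give the formal definition, so the sketch below assumes one natural reading: the partial-support vector of an itemset X stores, for each k from 0 to |X|, the number of transactions containing exactly k items of X. Under that assumption, support, bond (transactions containing all of X divided by transactions containing at least one item of X) and a simple δ-tolerant support are all functions of that vector. The `extend` step shows one recurrence that follows from the assumed definition; it is an illustration, not necessarily the recurrence derived in the paper. All function names (`partial_supports`, `extend`, `bond`, `ft_support`) and the toy data are hypothetical.

```python
from typing import Iterable, Sequence


def partial_supports(transactions: Iterable[Iterable[str]], itemset: Iterable[str]) -> list[int]:
    """Hypothetical helper (assumed definition):
    counts[k] = number of transactions containing exactly k items of itemset."""
    x = frozenset(itemset)
    counts = [0] * (len(x) + 1)
    for t in transactions:
        counts[len(x.intersection(t))] += 1
    return counts


def extend(p_all: Sequence[int], p_with_a: Sequence[int]) -> list[int]:
    """Hypothetical recurrence step: partial-support vector of X ∪ {a}, built from
    p_all    = vector of X over all transactions, and
    p_with_a = vector of X over only the transactions that contain a (a not in X)."""
    out = [0] * (len(p_all) + 1)
    for k, (total, with_a) in enumerate(zip(p_all, p_with_a)):
        out[k] += total - with_a   # transactions without a: match count stays k
        out[k + 1] += with_a       # transactions with a: match count becomes k + 1
    return out


def support(p: Sequence[int]) -> int:
    return p[-1]  # transactions containing every item of X


def bond(p: Sequence[int]) -> float:
    touched = sum(p[1:])  # transactions containing at least one item of X
    return p[-1] / touched if touched else 0.0


def ft_support(p: Sequence[int], delta: int) -> int:
    # Assumed fault-tolerant count: transactions missing at most delta items of X
    size = len(p) - 1
    return sum(p[max(0, size - delta):])


# Toy data (hypothetical)
TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}, {"d"}]
p_ab = partial_supports(TRANSACTIONS, {"a", "b"})
p_ab_c = partial_supports([t for t in TRANSACTIONS if "c" in t], {"a", "b"})
p_abc = extend(p_ab, p_ab_c)
print(p_ab, p_abc)  # → [1, 2, 2] [1, 1, 2, 1]
```

Here `support(p_abc)` is 1, `bond(p_abc)` is 0.25 and `ft_support(p_abc, 1)` is 3. In this toy the conditional vector is recomputed by scanning, but in a depth-first FIM algorithm that already keeps per-item conditional data (for example, tid-lists), each candidate could carry |X| + 1 counters instead of a single count and be extended with a step like `extend`.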

Similar resources

Efficient Incremental Mining of Top-K Frequent Closed Itemsets

In this work we study the mining of top-K frequent closed itemsets, a recently proposed variant of the classical problem of mining frequent closed itemsets where the support threshold is chosen as the maximum value sufficient to guarantee that the itemsets returned in output be at least K. We discuss the effectiveness of parameter K in controlling the output size and develop an efficient algori...

Depth-First Non-Derivable Itemset Mining

Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collecti...

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases

The issue of maintaining privacy in frequent itemset mining has attracted considerable attention. In most of those works, only distorted data are available, which may bring a lot of issues in the data mining process. Especially, in the dynamic update distorted database environment, it is nontrivial to mine frequent itemsets incrementally due to the high counting overhead to recompute support cou...

Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support

Real-world datasets are sparse, dirty and contain hundreds of items. In such situations, discovering interesting rules (results) using the traditional frequent itemset mining approach by specifying a user-defined input support threshold is not appropriate. Since, without any domain knowledge, setting the support threshold too large or too small can output nothing or a large number of redundant uninteresting res...

Fast Vertical Mining Using Boolean Algebra

The vertical association rules mining algorithm is an efficient mining method, which makes use of support sets of frequent itemsets to calculate the support of candidate itemsets. It overcomes the drawback of algorithms such as Apriori, which scan the database many times. In vertical mining, frequent itemsets can be represented as a set of bit vectors in memory, which enables fast computation. The...

Publication date: 2009